A Brief Discussion on Git Data Structures
All data stored by Git resides within the .git folder. Deleting the .git folder is equivalent to deleting the local version control for that repository. Below are the main contents of the .git folder.
Folders
hooks
Stores various custom scripts that can be executed automatically at specific moments during Git operations, such as before or after commit, push, or merge. These scripts can be used to perform automated tests, check code style, and more. Common hooks include pre-commit, pre-push, and post-merge.
info
Stores auxiliary information files. By default, there is an exclude file used to define rules for excluding specific files or directories. It serves the same purpose as .gitignore, but exclude is a local configuration applied to an individual developer's environment. For team development, you should use .gitignore and track it in version control.
logs
Records the update history of references (such as branches or HEAD). These logs can be used to track who made changes to a branch and when. Common contents include:
- HEAD: Records the history of every HEAD change.
- refs/heads: Stores the change history for each branch.
- refs/remotes/origin: Stores git fetch and git push records for remote branches. "origin" is the default alias for the remote repository linked by the local repository, though other names can be created as needed.
TIP
- The Git command git reflog displays the contents of logs/HEAD. If you have discarded commits using git reset --hard or git rebase -i, you can use git reflog to find the hash HEAD pointed to before the operation and then use git reset --hard with that hash to restore the commit.
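To make the tip more concrete, here is a minimal Python sketch that parses one line of logs/HEAD. It assumes the usual layout of old hash, new hash, identity, Unix timestamp, time zone, a tab, and then the message, and it assumes 40-character SHA-1 hashes. The sample line, the identity, and the names ReflogEntry and parse_reflog_line are made up for illustration.

```python
import re
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old_sha: str    # where the reference pointed before the operation
    new_sha: str    # where it points afterwards
    who: str        # "Name <email>" of the person who ran the command
    timestamp: int  # seconds since the Unix epoch
    tz: str         # time zone offset such as "+0800"
    message: str    # e.g. "commit: ..." or "reset: moving to ..."


# Assumed layout: "<old> <new> <name> <email> <unix-time> <tz>\t<message>"
_REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<who>.*) (?P<ts>\d+) (?P<tz>[+-]\d{4})\t(?P<msg>.*)$"
)


def parse_reflog_line(line: str) -> ReflogEntry:
    match = _REFLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unrecognized reflog line: {line!r}")
    return ReflogEntry(
        old_sha=match["old"],
        new_sha=match["new"],
        who=match["who"],
        timestamp=int(match["ts"]),
        tz=match["tz"],
        message=match["msg"],
    )


# Hypothetical sample line; the hashes, identity and message are made up.
sample = (
    "0" * 40
    + " 3b3a827b86d264f9c81bc77ef6e0e3df5e302ae8"
    + " Example Dev <dev@example.com> 1700000000 +0800"
    + "\treset: moving to HEAD~1"
)
entry = parse_reflog_line(sample)
```

The old_sha field of the entry written by a reset is the hash to pass to git reset --hard when you want to undo that reset.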
objects
- Purpose: Stores all Git data objects, including blob, tree, commit, and tag objects.
- Structure: Uses the first two characters of the object's SHA-1 hash as the directory name, and the remaining 38 characters as the filename. For example, an object with the SHA-1 hash d670460b4b4aece5915caf5c68d12f560a9fe3e4 will be stored in .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4.
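The directory rule above can be sketched in a few lines of Python. The function name loose_object_path is hypothetical, and the sketch only covers loose objects; Git may also store objects in pack files under objects/pack, which this does not handle.

```python
from pathlib import PurePosixPath


def loose_object_path(sha1_hex: str, git_dir: str = ".git") -> PurePosixPath:
    # First two hex characters name the directory, the other 38 name the file.
    if len(sha1_hex) != 40 or any(c not in "0123456789abcdef" for c in sha1_hex):
        raise ValueError("expected a 40-character lowercase hex SHA-1")
    return PurePosixPath(git_dir, "objects", sha1_hex[:2], sha1_hex[2:])


path = loose_object_path("d670460b4b4aece5915caf5c68d12f560a9fe3e4")
print(path)  # .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```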
Object Generation During Commit Operations
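When you commit a change to a file, at least three objects are generated (more if the file sits inside subdirectories, since each directory level gets its own tree object):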
Blob Object: Stores the actual content of the file. For example, the content of a newly added or modified file is created and stored as a blob object.
Tree Object: Stores the directory structure and the SHA-1 hashes of the blob objects for all files within that directory, describing the tree structure of files and subdirectories.
Commit Object: Stores commit information, including the SHA-1 hash of the tree object, the SHA-1 hash of the parent commit (or commits, for a merge), the commit message, and author/committer information. The hash shown for a commit by git log and other Git tools is the hash of this object.
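As a rough illustration of how an object gets its hash, the following Python sketch builds a blob the way Git does: a header made of the type, a space, the body size and a NUL byte, followed by the body, hashed with SHA-1 and compressed with zlib for storage. The helper names are hypothetical and the sketch assumes a SHA-1 repository. For the 13-byte content "test content" plus a newline it should produce the hash used in the example above.

```python
import hashlib
import zlib
from typing import Tuple


def encode_object(kind: str, body: bytes) -> bytes:
    # Header is "<type> <size>" followed by a NUL byte, then the raw body.
    return f"{kind} {len(body)}".encode("ascii") + b"\x00" + body


def object_id(kind: str, body: bytes) -> str:
    # The object name is the SHA-1 of header plus body, as 40 hex characters.
    return hashlib.sha1(encode_object(kind, body)).hexdigest()


def decode_loose_object(compressed: bytes) -> Tuple[str, bytes]:
    # Inverse of zlib.compress(encode_object(...)): returns (type, body).
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind, body


content = b"test content\n"
blob_id = object_id("blob", content)
on_disk = zlib.compress(encode_object("blob", content))  # bytes of the loose file
kind, body = decode_loose_object(on_disk)
```

Tree and commit objects use the same header scheme with a different type name and body layout, which is why one hash function can name all of them.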
For more detailed file content, please refer to the official documentation: "Git Internals - Git Objects".
refs
Stores references. Each filename is a branch or tag name, and the file content is the hash of the commit it currently points to. Common folders include:
- heads: Stores local branches. If a branch name contains "/", a corresponding directory structure is created. For example, for "feature/requirement1", a "feature" folder is created, containing a file named "requirement1".
- remotes: Stores remote branches, using the remote repository name as the folder, such as "origin".
- tags: Stores tags, one file per tag name.
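The mapping from a reference name to a file under refs can be sketched as below. The function name loose_ref_path is hypothetical, and the sketch assumes the reference is stored as a loose file; Git can also keep references in a single packed-refs file, in which case no such file exists.

```python
from pathlib import PurePosixPath


def loose_ref_path(kind: str, name: str, git_dir: str = ".git") -> PurePosixPath:
    # kind selects the folder; each "/" in the name becomes a directory level.
    if kind not in {"heads", "remotes", "tags"}:
        raise ValueError(f"unknown reference kind: {kind!r}")
    return PurePosixPath(git_dir, "refs", kind, *name.split("/"))


branch_file = loose_ref_path("heads", "feature/requirement1")
remote_file = loose_ref_path("remotes", "origin/main")
print(branch_file)  # .git/refs/heads/feature/requirement1
print(remote_file)  # .git/refs/remotes/origin/main
```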
Files
COMMIT_EDITMSG
Records the message of the last commit. If you use git commit (without -m), git commit --amend, or edit a message while resolving a conflict, this file is opened for editing. Some GUI tools may provide their own UI for editing instead of opening this file when executing these commands.
TIP
git commit --amend does not take its initial text from whatever this file already contains; rather, the message of the previous commit is written into this file for editing.
config
Stores the Git configuration for the repository. It uses the same format as the user-level .gitconfig, but its settings apply only to this repository.
description
Used by GitWeb (Git's web interface) to read the repository's description.
index
A binary file that serves as the staging area: it holds the file snapshot from the latest commit together with the information for files added via git add.
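To give a feel for the binary layout, here is a small Python sketch that reads only the 12-byte header at the start of the index: a 4-byte signature DIRC, a 4-byte version number, and a 4-byte entry count, all big-endian. The function name parse_index_header is hypothetical, and the header bytes are built by hand rather than read from a real repository.

```python
import struct
from typing import Tuple


def parse_index_header(data: bytes) -> Tuple[int, int]:
    # Returns (version, entry_count) from the first 12 bytes of an index file.
    if len(data) < 12:
        raise ValueError("index header needs at least 12 bytes")
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# Hand-built header for illustration: version 2, three entries.
fake_header = b"DIRC" + struct.pack(">II", 2, 3)
version, entry_count = parse_index_header(fake_header)
```

Against a real repository, the same function could be fed the first 12 bytes of .git/index opened in binary mode. The per-file entries that follow the header are more involved and are not covered here.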
HEAD
Stores the name of the currently checked-out branch or a specific commit. When the current HEAD points to a branch (e.g., main), it displays ref: refs/heads/main; when HEAD points to a specific commit, it stores the HASH value of that commit.
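The two forms of HEAD can be told apart with a few lines of Python. This is only a sketch; the names HeadState and parse_head are hypothetical, and the hash in the second call is reused from the FETCH_HEAD example purely as sample input.

```python
from typing import NamedTuple, Optional


class HeadState(NamedTuple):
    ref: Optional[str]     # e.g. "refs/heads/main" when on a branch
    commit: Optional[str]  # commit hash when HEAD is detached


def parse_head(text: str) -> HeadState:
    value = text.strip()
    if value.startswith("ref: "):
        # Symbolic form: HEAD names a branch instead of a commit.
        return HeadState(ref=value[len("ref: "):], commit=None)
    # Otherwise HEAD holds the commit hash directly (detached HEAD).
    return HeadState(ref=None, commit=value)


on_branch = parse_head("ref: refs/heads/main\n")
detached = parse_head("3b3a827b86d264f9c81bc77ef6e0e3df5e302ae8\n")
```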
ORIG_HEAD
Stores the commit that HEAD pointed to before operations that move it drastically (such as git reset, git merge, etc.), so that you can return to the previous state if necessary.
FETCH_HEAD
Records what the last git fetch retrieved for each branch. The format per line is as follows:
{Commit SHA-1} [not-for-merge] branch '{branch name}' of {remote repository URL}
Example:
3b3a827b86d264f9c81bc77ef6e0e3df5e302ae8 not-for-merge branch 'main' of http://127.0.0.1/wing/Project
[not-for-merge]: Indicates that this branch is not going to be merged into the current branch. git pull is actually git fetch + git merge; the branch that the merge step will use is written without this marker.
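A line in this format can be split apart as sketched below. The sketch assumes the three fields are separated by tab characters and that the middle field is left empty for a branch that is meant to be merged. The names FetchHeadEntry and parse_fetch_head_line are hypothetical.

```python
from typing import NamedTuple


class FetchHeadEntry(NamedTuple):
    sha: str          # commit fetched for this branch
    for_merge: bool   # False when the line carries "not-for-merge"
    description: str  # e.g. "branch 'main' of <url>"


def parse_fetch_head_line(line: str) -> FetchHeadEntry:
    # Assumed layout: "<sha>\t<not-for-merge or empty>\t<description>"
    sha, marker, description = line.rstrip("\n").split("\t", 2)
    return FetchHeadEntry(sha, marker != "not-for-merge", description)


sample = (
    "3b3a827b86d264f9c81bc77ef6e0e3df5e302ae8"
    "\tnot-for-merge"
    "\tbranch 'main' of http://127.0.0.1/wing/Project"
)
skipped = parse_fetch_head_line(sample)
# Same line with the marker field left empty, as for a branch to be merged.
merged = parse_fetch_head_line(sample.replace("not-for-merge", ""))
```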
A Brief Discussion on Branches
From the Git structure described above, it is clear that branches and tags are simply references pointing to specific commits. Tags point to fixed commit objects, while a branch moves forward with every commit made on it. The branch graph starts from the commit object that the branch points to, follows the parent commit recorded in each commit object backwards, and eventually produces the complete commit history structure.
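The idea can be modeled with a toy graph in Python. Everything here is hypothetical (the commit ids c1 to c5, the tag v1.0, the helper names), and the traversal is a simple parent walk, not the ordering git log actually uses.

```python
from typing import Dict, List

# Toy commit graph: commit id -> list of parent ids (all ids are made up).
parents: Dict[str, List[str]] = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "c4": ["c2"],
}
# Branches and tags are just names mapped to a commit id.
branches = {"main": "c3", "feature/requirement1": "c4"}
tags = {"v1.0": "c2"}


def commit(branch: str, new_id: str) -> None:
    # A new commit records the old branch tip as its parent, then the branch moves.
    parents[new_id] = [branches[branch]]
    branches[branch] = new_id


def history(start: str) -> List[str]:
    # Walk from a commit back through its parents, newest first.
    seen, order, stack = set(), [], [start]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        stack.extend(reversed(parents[current]))
    return order


commit("main", "c5")
main_history = history(branches["main"])
feature_history = history(branches["feature/requirement1"])
```

After the call to commit, the main branch points at c5 while the tag v1.0 still points at c2, which mirrors the difference between a moving branch and a fixed tag.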
The above was just to review the content from the Git course I took with Will Huang a few years ago; what follows is the main part where I start rambling.
From the perspective of a fantasy novel, the branch graph is like a known timeline, branches represent the current nodes, and tags are fixed historical coordinates. After each commit, if you use git reset or git rebase, the nodes before restoration are still stored in the "objects" folder. Each commit node in the branch graph is a determined past, while the nodes before git reset are possible futures. HEAD represents the current location in space-time. To travel through time, you can only see historical coordinates (tags) and known timelines (branch graphs); everything else must be queried via git reflog.
Change History
- Initial version created.
- Removed descriptions regarding the root-level ".gitconfig" as it cannot be tracked by version control.